20 research outputs found
Few-shot classification in Named Entity Recognition Task
For many natural language processing (NLP) tasks, the amount of annotated data
is limited. This motivates the use of semi-supervised learning techniques such
as transfer learning and meta-learning. In this work we tackle the Named Entity
Recognition (NER) task using Prototypical Networks, a metric learning
technique. The model learns intermediate representations of words that cluster
well into named entity classes. This property allows it to classify words from
an extremely limited number of training examples, and it can potentially be
used as a zero-shot learning method. By coupling this technique with transfer
learning, we achieve well-performing classifiers trained on only 20 instances
of a target class.
Comment: In proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
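A Prototypical Network classifies a query by its distance to class prototypes, the mean embeddings of the few labeled support examples per class. A minimal NumPy sketch of that inference step, with hand-made 2-D vectors standing in for learned word representations (all values are toy assumptions, not from the paper):

```python
import numpy as np

def prototypes(support_embeddings, support_labels):
    """Mean embedding per class: the class 'prototype'."""
    classes = np.unique(support_labels)
    protos = np.stack([
        support_embeddings[support_labels == c].mean(axis=0) for c in classes
    ])
    return classes, protos

def classify(query_embeddings, classes, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_embeddings[:, None, :] - protos[None, :, :], axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-D word embeddings: two entity classes, two support examples each.
support = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.2, 4.9]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
preds = classify(np.array([[0.05, 0.0], [4.8, 5.1]]), classes, protos)
# preds -> array([0, 1]): each query lands on its nearest prototype
```

With few (or, in the extreme, zero) examples of a new class, adding its prototype requires no retraining, which is the property the abstract exploits.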
Distribution-Free Statistical Dispersion Control for Societal Applications
Explicit finite-sample statistical guarantees on model performance are an
important ingredient in responsible machine learning. Previous work has focused
mainly on bounding either the expected loss of a predictor or the probability
that an individual prediction will incur a loss value in a specified range.
However, for many high-stakes applications, it is crucial to understand and
control the dispersion of a loss distribution, or the extent to which different
members of a population experience unequal effects of algorithmic decisions. We
initiate the study of distribution-free control of statistical dispersion
measures with societal implications and propose a simple yet flexible framework
that allows us to handle a much richer class of statistical functionals beyond
previous work. Our methods are verified through experiments in toxic comment
detection, medical imaging, and film recommendation.
Comment: Accepted by NeurIPS as a spotlight (top 3% among submissions)
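One dispersion functional with direct societal meaning is the Gini coefficient of the loss distribution across a population. As a point of reference only, here is the standard plug-in Gini estimate on a loss sample; this is the quantity being controlled, not the paper's distribution-free bounding procedure:

```python
import numpy as np

def gini(losses):
    """Gini coefficient of a sample of non-negative losses: 0 when every
    member incurs the same loss, approaching 1 as loss concentrates on few."""
    x = np.sort(np.asarray(losses, dtype=float))
    n = x.size
    # Closed-form estimator over the sorted sample (O(n log n)).
    return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

uniform = gini([1.0, 1.0, 1.0, 1.0])  # equal losses -> 0.0
skewed  = gini([0.0, 0.0, 0.0, 4.0])  # one member bears all loss -> 0.75
```

A guarantee on such a functional says something a bound on the mean loss cannot: whether algorithmic harms are spread evenly or concentrated on a subgroup.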
Quantile Risk Control: A Flexible Framework for Bounding the Probability of High-Loss Predictions
Rigorous guarantees about the performance of predictive algorithms are
necessary in order to ensure their responsible use. Previous work has largely
focused on bounding the expected loss of a predictor, but this is not
sufficient in many risk-sensitive applications where the distribution of errors
is important. In this work, we propose a flexible framework to produce a family
of bounds on quantiles of the loss distribution incurred by a predictor. Our
method takes advantage of the order statistics of the observed loss values
rather than relying on the sample mean alone. We show that a quantile is an
informative way of quantifying predictive performance, and that our framework
applies to a variety of quantile-based metrics, each targeting important
subsets of the data distribution. We analyze the theoretical properties of our
proposed method and demonstrate its ability to rigorously control loss
quantiles on several real-world datasets.
Comment: 24 pages, 4 figures. Code is available at
https://github.com/jakesnell/quantile-risk-contro
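The underlying device is classical: for i.i.d. losses, the probability that the k-th order statistic exceeds the p-quantile is a Binomial(n, p) tail probability, independent of the loss distribution. A stdlib sketch of the resulting one-sided quantile bound, as an illustration of the order-statistic idea rather than the paper's full family of bounds:

```python
import math

def quantile_ucb(losses, p, delta):
    """Distribution-free (1 - delta) upper confidence bound on the p-quantile
    from i.i.d. losses, using P(Q_p <= L_(k)) = P(Binomial(n, p) <= k - 1).
    Returns the smallest order statistic that certifies the bound."""
    x = sorted(losses)
    n = len(x)
    cdf = 0.0
    for k in range(1, n + 1):
        # Accumulate P(Binomial(n, p) = k - 1).
        cdf += math.comb(n, k - 1) * p ** (k - 1) * (1 - p) ** (n - k + 1)
        if cdf >= 1 - delta:
            return x[k - 1]
    return float("inf")  # sample too small to certify the bound

losses = list(range(1, 101))          # toy loss sample: 1, 2, ..., 100
bound = quantile_ucb(losses, 0.9, 0.05)  # 95%-confident bound on the 0.9-quantile
```

Relaxing delta can only lower (never raise) the certified order statistic, which is why a whole family of quantile bounds falls out of the same sorted sample.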
Im-Promptu: In-Context Composition from Image Prompts
Large language models are few-shot learners that can solve diverse tasks from
a handful of demonstrations. This implicit understanding of tasks suggests that
the attention mechanisms over word tokens may play a role in analogical
reasoning. In this work, we investigate whether analogical reasoning can enable
in-context composition over composable elements of visual stimuli. First, we
introduce a suite of three benchmarks to test the generalization properties of
a visual in-context learner. We formalize the notion of an analogy-based
in-context learner and use it to design a meta-learning framework called
Im-Promptu. Whereas the requisite token granularity for language is well
established, the appropriate compositional granularity for enabling in-context
generalization in visual stimuli is usually unspecified. To this end, we use
Im-Promptu to train multiple agents with different levels of compositionality,
including vector representations, patch representations, and object slots. Our
experiments reveal tradeoffs between extrapolation abilities and the degree of
compositionality, with non-compositional representations extending learned
composition rules to unseen domains but performing poorly on combinatorial
tasks. Patch-based representations require patches to contain entire objects
for robust extrapolation. At the same time, object-centric tokenizers coupled
with a cross-attention module generate consistent and high-fidelity solutions,
with these inductive biases being particularly crucial for compositional
generalization. Lastly, we demonstrate a use case of Im-Promptu as an intuitive
programming interface for image generation.
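The classic vector-offset completion of an analogy (A : B :: C : ?) is a simple stand-in for what an analogy-based in-context learner must do over visual tokens. A toy sketch with hypothetical two-dimensional "slot" vectors encoding [shape, color]; none of this is the paper's architecture, only the analogical-reasoning primitive it formalizes:

```python
import numpy as np

def complete_analogy(a, b, c, candidates):
    """Solve A : B :: C : ? by the vector-offset heuristic:
    apply the transformation (b - a) to c, then pick the nearest candidate."""
    target = c + (b - a)
    d = np.linalg.norm(candidates - target, axis=1)
    return int(d.argmin())

# Hypothetical slots [shape, color]: (square,red):(square,blue) :: (circle,red):?
a, b = np.array([0.0, 0.0]), np.array([0.0, 1.0])
c = np.array([1.0, 0.0])
cands = np.array([[0.0, 0.0],   # (square, red)
                  [1.0, 1.0],   # (circle, blue)
                  [1.0, 0.0]])  # (circle, red)
idx = complete_analogy(a, b, c, cands)  # -> 1, i.e. (circle, blue)
```

The abstract's finding is about which token granularity (vectors, patches, object slots) makes this kind of composition generalize, not about the offset rule itself.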
Few-Shot Attribute Learning
Semantic concepts are frequently defined by combinations of underlying
attributes. As mappings from attributes to classes are often simple,
attribute-based representations facilitate novel concept learning with zero or
few examples. A significant limitation of existing attribute-based learning
paradigms, such as zero-shot learning, is that the attributes are assumed to be
known and fixed. In this work we study the rapid learning of attributes that
were not previously labeled. Compared to standard few-shot learning of semantic
classes, in which novel classes may be defined by attributes that were relevant
at training time, learning new attributes presents a stiffer challenge. We found
that supervised learning with training attributes does not generalize well to
new test attributes, whereas self-supervised pre-training brings significant
improvement. We further experimented with random splits of the attribute space
and found that predictability of test attributes provides an informative
estimate of a model's generalization ability.
Comment: Technical report, 25 pages
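The "simple mapping from attributes to classes" that makes attribute representations useful can be made concrete: predict a binary attribute vector for an input, then pick the class whose attribute signature matches best. A hypothetical zero-shot sketch (the class names and attributes below are invented for illustration):

```python
import numpy as np

def attribute_zero_shot(pred_attrs, class_attrs):
    """Zero-shot classification via attributes: compare the model's predicted
    binary attribute vector to each class's attribute signature and return
    the Hamming-nearest class index."""
    d = (pred_attrs[None, :] != class_attrs).sum(axis=1)
    return int(d.argmin())

# Classes defined by attributes [has_stripes, has_wings, is_aquatic].
class_attrs = np.array([[1, 0, 0],   # zebra
                        [0, 1, 0],   # sparrow
                        [0, 0, 1]])  # dolphin
pred = np.array([0, 1, 0])           # attributes predicted for some input
label = attribute_zero_shot(pred, class_attrs)  # -> 1 (sparrow)
```

Note that this mapping only works if the attribute predictor generalizes, which is exactly the failure mode the abstract reports for attributes never labeled during training.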